
FUZZY BASED NATURAL SPEECH CONCEPT SYSTEM 

[0001] This application claims priority to U.S. Provisional Application Serial 
No. 60/432,521, filed December 11, 2002. 

BACKGROUND OF THE INVENTION 

[0002] The present invention is mainly directed to a fuzzy utterance concept 
detection and conceptual grammar learning system. 

[0003] Automatic telephone conversation systems, which are activated in 
response to a user request spoken into the telephone, are well known in the IT 
industry. A conversation system may contain automatic speech processing units such as a 
speech recognition engine (converting speech to text), a TTS engine (converting text to 
speech), a natural language understanding engine, a conversation flow management 
engine and a communication channel to business servers. The natural language 
understanding engine may further include a concept lexicon and a parser for grasping the 
intentions and indications contained in a user's utterance and for providing this 
information to the conversation system. 

[0004] Several known automatic telephone conversation systems include a 
natural language understanding system for utterance meaning detection. The natural 
language understanding system could consist of semantic lexicons, keyword lists and a 
parser for detecting the meanings represented by the keywords and their combinations. A 
conversation manager or controller, which is connected to one or a combination of these 
parsers, controls the conversation flow and communication channels to business servers. 
In response to the detected meanings, one or more deployment aspects of the 
conversation system, such as the natural language generation and TTS engines, may be 
invoked. A telephone conversation system with natural speech understanding capabilities 
is commonly referred to as a "mixed initiative" conversational system. This type of 
system is considered to have advantages over menu-driven systems. Specifically, if the 
user's intentions and indications are broad and come in free order, building a menu system 
would be impractical, and it may be preferable to let the user speak freely rather than 
listen to a menu list. 

[0005] Grammar acquisition and concept understanding are key components 
of mixed initiative conversation systems. There are several types of such systems but 
many of them suffer from serious shortcomings. A system that classifies concepts based 
on a keyword list (and aliases of the keywords) may be misled if a keyword is 
misrecognized, for instance. A system that classifies concepts based on pre-defined 
speech templates may not be reliable, as people speak in different situations, in different 
styles and at different levels of specificity. A system that relies solely on a pre-defined 
grammar cannot account for false recognitions, because rule-based grammar parsing is not 
robust. Noise such as misrecognized words, rephrasing, hesitations, false starts and 
filler words can cause the parser to fail. Also, a partial parse-based system relying on 
semantic rules for re-assembling the meaning of the complete sentence suffers from a lack 
of information for creating sufficient semantic rules. 

SUMMARY OF THE INVENTION 

[0006] In a user speech meaning detection system according to the present 
invention, errors due to the complexity of user input and to recognizer problems are 
compensated for because the broad context is measured as a fuzzy set to which a correct 
concept belongs. This invention provides a simple yet reliable method to compensate for 
the missing factors in order to accurately classify concepts and determine the user's 
intentions and indications. 

[0007] The present invention provides a novel fuzzy natural speech concept 
system that includes: (i) a concept classification and fuzzy conceptual grammar, (ii) a 
fuzzy concept grammar learning system, and (iii) a system for concept derivation from 
the speech of the user. 

[0008] In accordance with the preferred embodiment of the present invention, 
the fuzzy speech concept system and fuzzy conceptual grammar comprise: (a) one or 
more semantic lexicons, and (b) one or more natural speech corpora. 

[0009] The grammar learning and concept derivation modules 
comprise: (a) a concept classification unit, (b) a fuzzy concept grammar-learning unit, 
(c) a concept derivation unit, and (d) a testing and evaluation unit. These units work in 
a certain order to form development cycles: first, with a given semantic lexicon and a 
natural speech corpus (transcripts of voice recordings), the concept classification unit 
generates a concept classification database specific to the corpus; second, the grammar 
learning unit generates a fuzzy concept grammar; third, the concept derivation unit 
applies the derived grammar to a set of test utterances; fourth, the testing and evaluation 
unit evaluates the performance of the system. Based on the evaluation, adjustments may be 
made to the concept classification and the system "re-learns" the grammar. Once the 
development cycle is over, the system can be used as the natural language understanding 
engine in a telephone conversation system. 

[0010] The present invention places no restrictions on the type of semantic 
lexicon and natural speech corpora to be used. Any type of hierarchical semantic lexicon 
and raw text corpora can be used as long as they provide the system with word 
classification information and word co-occurrence information. 



BRIEF DESCRIPTION OF THE DRAWINGS 

[0011] Other advantages of the present invention can be understood by 
reference to the following detailed description when considered in connection with the 
accompanying drawings wherein: 

[0012] FIG. 1 is a schematic block diagram of the fuzzy natural speech 
concept system (FNCS); 

[0013] FIG. 2 is a flow chart of the concept classification algorithm; 

[0014] FIG. 3 is a flow chart of the fuzzy concept grammar learning 
algorithm; 

[0015] FIG. 4 is a flow chart of the concept derivation algorithm; 

[0016] FIG. 5 is a block diagram of the test evaluation algorithm; and 

[0017] FIG. 6 is a schematic of a computer on which the FNCS of FIG. 1 can 
be implemented. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

[0018] Referring to FIG. 1, a fuzzy natural speech concept system (FNCS) 
includes one or more lexical databases 410, 412, 414 installed on a computing device, 
which can be accessed in either read or write mode by any of the software 
modules. Any lexical database that meets certain specifications may be used. An example 
of such a lexical database is the semantic lexicon WordNet, which provides a 
hierarchical classification of the English vocabulary. An example of a speech corpus is 
ATIS, which contains over twelve thousand transcribed utterances in the air travel 
information domain. The FNCS also comprises a fuzzy concept grammar database 416 
containing the results of the concept grammar-learning module 420. There are no 
restrictions on the type of database to be used as the grammar database. One possible 
candidate is a Prolog database, for instance, containing clauses describing the fuzzy 
sets to which a concept may belong. 
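
By way of illustration only, the clause-style storage described above might be sketched as follows in Python rather than Prolog. The names FuzzyRule and FuzzyGrammarDB, the field layout and the example membership values are hypothetical and are not taken from the embodiment; they merely show one way a grammar database could record the degree to which a context word supports a concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyRule:
    """One clause-like record (hypothetical layout, not from the embodiment)."""
    concept: str        # concept whose fuzzy set the rule describes
    context_word: str   # word observed next to the concept word
    side: str           # "left" or "right" of the concept word
    membership: float   # assumed degree of membership in [0, 1]


class FuzzyGrammarDB:
    """Minimal in-memory stand-in for the fuzzy concept grammar database."""

    def __init__(self):
        self._rules = {}

    def add(self, rule):
        if not 0.0 <= rule.membership <= 1.0:
            raise ValueError("membership must lie in [0, 1]")
        self._rules[(rule.concept, rule.context_word, rule.side)] = rule

    def membership(self, concept, context_word, side):
        # An unknown combination contributes no support (membership 0.0).
        rule = self._rules.get((concept, context_word, side))
        return rule.membership if rule else 0.0


db = FuzzyGrammarDB()
db.add(FuzzyRule("DEPARTURE_CITY", "from", "left", 0.9))  # illustrative values
db.add(FuzzyRule("ARRIVAL_CITY", "to", "left", 0.8))
```

A lookup such as db.membership("DEPARTURE_CITY", "from", "left") then plays the role of querying a clause in the grammar database.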

[0019] Given the lexical semantic information provided by database 410 and the 
statistical information provided by database 412, the concept classification module 418 
classifies concepts in database 412 into domain specific categories and sends them to the 
concept grammar learning module 420. The concept classification module 418 uses an 
algorithm to automatically detect the statistically significant concepts in the corpus and 
map words in the corpus to these concepts. The output of the module 420 is a fuzzy 
concept grammar 416. The fuzzy concept grammar 416 contains fuzzy inference rules, 
which assign fuzzy membership to concepts using context vectors (the words to the left 
and right of a concept in an utterance). The fuzzy concept grammar 416 is applied by the 
concept derivation module 422 to utterances in test corpus 414. Finally, a test and 
evaluation module 424 calculates the success rates of the concept derivation. Depending 
on the evaluation results, a further development cycle may be initiated by modifying the 
classification, increasing the training data, or adjusting the parameters of the respective 
modules. Otherwise, the concept derivation module, together with the fuzzy concept 
grammar, is delivered as the natural language understanding component of the automatic 
telephone conversation system. 

[0020] FIG. 2 provides a flow chart showing the algorithmic steps in the 
concept classification system, which decides whether a concept is significant to the 
domain in which the corpus is embedded. This is done through the statistical procedures 
514 and 520. When a significant concept is detected, the system stores it in storage 522. 
This whole process is repeated for all the words in the training corpus, taken from 
input 512, each of which is assigned multiple concepts by use of the semantic lexical 
database (ref. 410 of FIG. 1). 
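
As a non-limiting sketch of this classification step, the Python fragment below maps each corpus word to its lexicon concepts and keeps the concepts whose share of all word tokens reaches a threshold. The description does not specify the statistics used by procedures 514 and 520; the relative-frequency test, the function name significant_concepts, the parameter min_share and the toy lexicon are all assumptions made for illustration.

```python
from collections import Counter


def significant_concepts(corpus, lexicon, min_share=0.3):
    """Return {concept: share} for concepts frequent enough in the corpus.

    corpus    -- list of utterance strings (transcripts)
    lexicon   -- dict mapping a word to the concepts it may express
    min_share -- assumed significance threshold on relative frequency
    """
    counts = Counter()
    total_words = 0
    for utterance in corpus:
        for word in utterance.lower().split():
            total_words += 1
            # A word may be assigned multiple concepts by the lexicon.
            for concept in lexicon.get(word, ()):
                counts[concept] += 1
    if total_words == 0:
        return {}
    shares = {c: n / total_words for c, n in counts.items()}
    return {c: s for c, s in shares.items() if s >= min_share}


# Toy data (hypothetical, in the style of an air travel corpus).
LEXICON = {
    "boston": ["city", "location"],
    "denver": ["city", "location"],
    "flight": ["trip"],
}
CORPUS = ["flight from boston", "flight to denver", "boston to denver"]
kept = significant_concepts(CORPUS, LEXICON)
```

With this toy data, "city" and "location" each cover four of the nine word tokens and are kept, while "trip" covers two of nine and falls below the assumed threshold.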

[0021] FIG. 3 is a flow chart depicting the concept grammar learning 
system. The process starts with a preparation stage. Concept classes 612 derived from 
the concept classification module and training texts 614 are processed by a shallow parser 
618. The results are semantic phrases, which are stored in storage 620. The concept 
marking module 622 then marks the words of 620 with concepts from an annotated corpus 
sample 616 and stores the results in storage 624. In the fuzzy grammar rule generation 
stage, the marked phrases are processed word by word. Test point 625 checks whether a 
context word is a stop word and, if so, ignores it. Otherwise, the context word is used by 
modules 628 and 630 to calculate (1) syntactic weights and (2) statistical parameters for a 
fuzzy concept rule in relation to an annotated concept. The derived fuzzy concept 
rules are stored in 632. 
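
The rule generation stage might be sketched as follows. The description does not give formulas for the syntactic weights or the statistical parameters, so this example assumes, purely for illustration, a weight of 1/distance between a context word and the annotated concept word, and a membership value equal to the share of a context word's accumulated weight that falls on each concept. The names learn_rules, STOP_WORDS and window are hypothetical.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "please"}  # illustrative stop list


def learn_rules(marked_phrases, window=2):
    """Derive {(context_word, side, concept): membership} from marked phrases.

    marked_phrases -- list of phrases; each phrase is a list of
                      (word, concept_or_None) pairs from the marking step
    """
    weight = defaultdict(float)   # weight per (context_word, side, concept)
    totals = defaultdict(float)   # weight per (context_word, side), all concepts
    for phrase in marked_phrases:
        for i, (_, concept) in enumerate(phrase):
            if concept is None:
                continue
            lo, hi = max(0, i - window), min(len(phrase), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                ctx = phrase[j][0].lower()
                if ctx in STOP_WORDS:      # stop words are ignored
                    continue
                side = "left" if j < i else "right"
                w = 1.0 / abs(i - j)       # assumed syntactic weight
                weight[(ctx, side, concept)] += w
                totals[(ctx, side)] += w
    # Assumed statistical parameter: normalized share of the weight.
    return {key: w / totals[key[:2]] for key, w in weight.items()}


PHRASES = [
    [("flight", None), ("from", None), ("boston", "DEP")],
    [("flight", None), ("to", None), ("denver", "ARR")],
    [("from", None), ("the", None), ("denver", "DEP")],
]
rules = learn_rules(PHRASES)
```

In this toy run "from" on the left supports only DEP, whereas "flight" on the left is split evenly between DEP and ARR, which illustrates how an ambiguous context word receives partial membership.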

[0022] FIG. 4 depicts the top-level flow chart of the concept derivation 
module, which accepts a sequence of words and derives the concepts intended by the 
speaker by use of the fuzzy concept rules. At the start, sentence 712 and fuzzy rules 714 
are input to module 716, in which the words are given possible concepts. At test point 720, 
the words surrounding the concept are examined one by one. When a context word is found, 
it is sent to fuzzy inference module 722 to assist the inference of a correct concept. The 
whole process checks all the words in the input sentence by the loop implemented with 
the test point 718 and the stop point 724. The results of applying the fuzzy rules and 
performing inference with them are stored in the storage 726, in the form of concepts 
assigned to words in the input sentence. The matching results of module 716 should be 
distinguished from the inference results of module 722: in the 
former, a word is matched to a number of "possible" concepts according to the previous 
learning; in the latter, one of the possible concepts is selected and assigned to the 
word by applying the inference rules to the context words surrounding the word in the 
sentence. 
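
The two stages distinguished above, matching and inference, can be illustrated with the following sketch. It assumes that rules are keyed by (context word, side, concept) and that the support for a candidate concept is the maximum membership contributed by any context word, a common fuzzy-union choice that the description itself does not prescribe. The names derive_concepts, POSSIBLE and RULES and the numeric values are hypothetical.

```python
def derive_concepts(sentence, possible, rules, window=2):
    """Assign one concept to each word that has candidate concepts.

    possible -- dict: word -> list of possible concepts (matching stage)
    rules    -- dict: (context_word, side, concept) -> membership in [0, 1]
    Returns a list of (word, concept) pairs in sentence order.
    """
    words = sentence.lower().split()
    assigned = []
    for i, word in enumerate(words):
        candidates = possible.get(word)
        if not candidates:
            continue  # word carries no known concept
        scores = {c: 0.0 for c in candidates}
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            side = "left" if j < i else "right"
            for c in candidates:
                # Inference stage: keep the strongest support seen so far.
                support = rules.get((words[j], side, c), 0.0)
                scores[c] = max(scores[c], support)
        # On a tie, the first listed candidate is chosen.
        assigned.append((word, max(candidates, key=lambda c: scores[c])))
    return assigned


POSSIBLE = {"boston": ["DEP", "ARR"], "denver": ["DEP", "ARR"]}
RULES = {("from", "left", "DEP"): 0.9, ("to", "left", "ARR"): 0.8}
result = derive_concepts("flight from boston to denver", POSSIBLE, RULES)
```

Here both city names are matched to the same two possible concepts, and only the surrounding words "from" and "to" decide which concept each one receives.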

[0023] FIG. 5 is a block diagram depicting the process of a fuzzy concept 
system development cycle. The fuzzy concept (grammar) rules are learned by module 820. 
The results of learning are tested with an independent test corpus 814 and the concept 
derivation module 816. The performance of the test is analyzed by evaluation module 
818. The test point 822 examines whether the performance has passed a threshold of 
accuracy. When the test has passed the accuracy requirement, the derived fuzzy rules can 
be delivered to the telephone conversation system as the NLP module. Otherwise, more 
training is done by going through the training cycle again to improve the system accuracy. 
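
The development cycle can be summarized by the following sketch, offered as one possible arrangement rather than the actual implementation. The function development_cycle, its callable parameters learn and derive, the default threshold of 0.9 and the limit max_rounds are assumptions; the toy learner simply knows one more word per round so that the loop can be observed terminating.

```python
def development_cycle(learn, derive, test_set, threshold=0.9, max_rounds=5):
    """Repeat learning, derivation and evaluation until accuracy is reached.

    learn(round_no)         -- returns the rules learned in that round
    derive(rules, utterance) -- returns the concept derived for the utterance
    test_set                -- list of (utterance, expected_concept) pairs
    Returns (rules, accuracy, rounds); rules is None if never accurate enough.
    """
    if not test_set:
        raise ValueError("test_set must not be empty")
    accuracy = 0.0
    for round_no in range(1, max_rounds + 1):
        rules = learn(round_no)
        correct = sum(1 for utterance, expected in test_set
                      if derive(rules, utterance) == expected)
        accuracy = correct / len(test_set)
        if accuracy >= threshold:       # accuracy requirement passed
            return rules, accuracy, round_no
    return None, accuracy, max_rounds   # requirement not met; more work needed


# Toy stand-ins (hypothetical): each round the learner knows one more word.
TOY_DATA = [("boston", "CITY"), ("monday", "DAY"), ("delta", "AIRLINE")]


def toy_learn(round_no):
    return dict(TOY_DATA[:round_no])


def toy_derive(rules, word):
    return rules.get(word)


final_rules, final_accuracy, rounds_used = development_cycle(
    toy_learn, toy_derive, [("boston", "CITY"), ("monday", "DAY")], threshold=1.0
)
```

In this toy run the first round recognizes only one of the two test items, so a second round is performed before the rules would be delivered.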

[0024] FIG. 6 is a schematic of a computer 10 on which the fuzzy natural 
speech concept system described above can be implemented. The computer 10 
includes a CPU 12, memory 14, such as RAM, and storage 16, such as a hard drive, 
RAM, ROM or any other optical, magnetic or electronic storage. The computer 10 
further includes an input 18 for receiving the speech input, such as over a telephone line, 
and an output 20 for producing the responsive speech output, such as over the telephone 
line. The computer 10 may also include a display 22. The algorithms, software and 
databases described above with respect to FIGS. 1-5 are implemented on the computer 10 
and are stored in the memory 14 and/or storage 16. The computer 10 is suitably 
programmed to perform the steps and algorithms described herein. 

[0025] From the above description of a preferred embodiment of the 
invention, those skilled in the art will perceive improvements, changes and modifications. 
Such improvements, changes and modifications within the skill of the art are intended to 
be covered by the following claims. 